{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1><center> Additive models</center></h1>\n",
    "# 1. Basic\n",
    "A simple way to create a non-linear features with multiple inputs is to use a generalized additive model as\n",
    "$$\n",
    "g\\left[\\mu(\\vec{x}\\right]=w_0+\\sum_{m=1}^M\\phi_m(x_m)\n",
    "$$\n",
    "in which $x_m$ is the $m$-th feature of the inputs and $g$ is the link function; the $\\phi_m$ are unspecified smooth(\"nonparametric\") functions. This is a specific form of Adaptive basis function models, in which each non-linear features are only depended on one raw feature.\n",
    "\n",
    "Recall the logistic regression model for binary data. We relate the mean of the binary response $\\mu(\\vec{x})=P(y=1|\\vec{x})$ to the feature via a linear regression model and the logit link function\n",
    "$$\n",
    "log\\left(\\frac{\\mu(\\vec{x})}{1-\\mu(\\vec{x})}\\right)=w_0+\\sum_{m=1}^Mw_mx_m\n",
    "$$\n",
    "The additive logistic regression model replace each linear term by a more general functional form\n",
    "$$\n",
    "log\\left(\\frac{\\mu(\\vec{x})}{1-\\mu(\\vec{x})}\\right)=w_0+\\sum_{m=1}^M\\phi_m(x_m)\n",
    "$$\n",
    "\n",
    "The functions $\\phi_m$ are estimated in a flexible manner, using an algorithm whose basic building block is a [scatterplot smoother](https://en.wikipedia.org/wiki/Scatterplot_smoothing)\n",
    "# 2. Fitting Additive Models\n",
    "We will take additive regression model as example to introduce the fitting method of additive models. The additive regression model has the form\n",
    "$$\n",
    "y=w_0+\\sum_{m=1}^M\\phi_m(x_m)+\\epsilon\n",
    "$$\n",
    "Given training dataset $\\{\\vec{x}_i,y_i\\}_{i=1}^N$, a loss function can be specified for this problem as below\n",
    "$$\n",
    "L(w_0,\\phi_1,\\phi_2,\\cdots,\\phi_M)=\\sum_{i=1}^N\\left(y_i-w_0-\\sum_{m=1}^M\\phi_m(x_{im})\\right)^2\n",
    "+\\sum_{m=1}^M\\lambda_m\\int \\phi_m^{''}(t_j)^2dt_j\n",
    "$$\n",
    "where $\\lambda_m$ are strength of the regularizer for $\\phi_m$.\n",
    "\n",
    "The constant $w_0$ is not uniquely identifiable, since we can always add or subtract constants to any of the $\\phi_m$ functions. The convention is to assume $\\sum_{i=1}^N\\phi_m(x_{im})=0$ for all $m$ and $\\hat{w}_0=\\frac{1}{N}\\sum_{i=1}^Ny_i$\n",
    "\n",
    "To fit the rest of the model, we can center the response and then iteratively update each $\\phi_m$ in turn, using as a target vector the residuals obtained by omitting term $\\phi_m$.\n",
    "$$\n",
    "\\hat{\\phi}_m=smooother\\left(\\left\\{y_i-\\sum_{k\\not=m}\\hat{\\phi}_k(x_{ik})\\right\\}_{i=1}^N\\right)\n",
    "$$\n",
    "We should then ensure the output is zero mean using\n",
    "$$\n",
    "\\hat{\\phi}_m=\\hat{\\phi}_m-\\frac{1}{N}\\sum_{i=1}^N\\hat{\\phi}_m(x_{im})\n",
    "$$\n",
    "This is called the **backfitting algorithm**.\n",
    "# 3. Multivariate adaptive regression splines (MARS)\n",
    "We can extend additive models by allowing for interaction effects as follows\n",
    "$$\n",
    "g\\left[\\mu(\\vec{x}\\right]=w_0+\\sum_{m=1}^M\\phi_m(x_m)+\\sum_{m,n}\\phi_{mn}(x_m,x_n)+\\cdots\n",
    "$$\n",
    "Of course, we cannot allow for too many higher-order interactions, because there will be too many parameters to fit.It is common to use greedy search to decide which variables to add. However, there are too many candidate interactions which make it hard to use in real applications."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
